WE CLAIM:

1. A method of setting up a DLSI space-based classifier for document
classification comprising the steps of:

preprocessing documents to distinguish terms of a word and a noun phrase from 
stop words; 

constructing system terms by setting up a term list as well as global weights;

normalizing document vectors of collected documents, as well as centroid vectors
of each cluster;

constructing a differential term by intra-document matrix $D_I^{m \times n_I}$,
such that each column in said matrix is a differential intra-document vector;

decomposing the differential term by intra-document matrix $D_I$, by an SVD
algorithm, into $D_I = U_I S_I V_I^T$
($S_I = \mathrm{diag}(\delta_{I,1}, \delta_{I,2}, \ldots)$), followed by a
composition of $D_{I,k_I} = U_{k_I} S_{k_I} V_{k_I}^T$, giving approximate $D_I$
in terms of an appropriate $k_I$;

setting up a likelihood function of intra-differential document vector;

constructing a term by extra-document matrix $D_E^{m \times n_E}$, such that
each column of said extra-document matrix is an extra-differential document
vector;

decomposing $D_E$, by exploiting the SVD algorithm, into $D_E = U_E S_E V_E^T$
($S_E = \mathrm{diag}(\delta_{E,1}, \delta_{E,2}, \ldots)$), then with a proper
$k_E$, defining $D_{E,k_E} = U_{k_E} S_{k_E} V_{k_E}^T$ to approximate $D_E$;

setting up a likelihood function of extra-differential document vector;

setting up a posteriori function; and

using the DLSI space-based classifier to automatically classify a document.
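
By way of illustration only, the normalizing and matrix-construction steps of
claim 1 might be sketched as follows. The function names are hypothetical,
global weights are omitted, and pairing each document with the centroid of
every cluster other than its own is an assumption about how the
extra-differential vectors are collected. The returned lists correspond to the
columns of the intra- and extra-document matrices before any SVD is applied.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are copied)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm else list(v)


def centroid(vectors):
    """Normalized mean of the (already normalized) vectors of one cluster."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return normalize(mean)


def differential_columns(clusters):
    """Return (intra, extra) lists of differential document vectors.

    clusters maps a cluster label to a list of raw term-weight vectors.
    """
    docs = {label: [normalize(d) for d in ds] for label, ds in clusters.items()}
    cents = {label: centroid(ds) for label, ds in docs.items()}
    intra, extra = [], []
    for label, ds in docs.items():
        for d in ds:
            for other, c in cents.items():
                diff = [a - b for a, b in zip(d, c)]
                # Own centroid -> intra-differential; any other -> extra.
                (intra if other == label else extra).append(diff)
    return intra, extra


# Toy data (assumed): two clusters over a three-term vocabulary.
clusters = {"A": [[2.0, 0.0, 1.0], [1.0, 0.0, 1.0]], "B": [[0.0, 3.0, 1.0]]}
intra, extra = differential_columns(clusters)
```

Each list would then be arranged column-wise and passed to an SVD routine to
obtain the truncated factors named in the decomposing steps.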

2. An automatic document classification method using a DLSI space-based
classifier for classifying a document in accordance with clusters in a database,
comprising the steps of:

a) setting up a document vector by generating terms as well as frequencies of
occurrence of said terms in the document, so that a normalized document vector N
is obtained for the document;

b) constructing, using the document to be classified, a differential document
vector $x = N - C$, where C is the normalized vector giving a center or centroid
of a cluster;

c) calculating an intra-document likelihood function $P(x \mid D_I)$ for the
document;

d) calculating an extra-document likelihood function $P(x \mid D_E)$ for the
document;

e) calculating a Bayesian posteriori probability function $P(D_I \mid x)$;



f) repeating, for each of the clusters of the data base, steps b-e;

g) selecting a cluster having a largest $P(D_I \mid x)$ as the cluster to which
the document most likely belongs; and

h) classifying the document in the selected cluster.
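
By way of illustration only, steps a) to h) of claim 2 might be sketched as the
loop below. The names `classify`, `posterior`, `toy_intra` and `toy_extra` are
hypothetical, and the two toy likelihoods are placeholders chosen for the
example rather than the likelihood functions of the claims. The prior of one
over the number of clusters follows the choice recited in claim 7.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are copied)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm else list(v)


def posterior(p_intra, p_extra, n_clusters):
    """Bayes rule with P(D_I) = 1/n_clusters and P(D_E) = 1 - P(D_I)."""
    prior_i = 1.0 / n_clusters
    num = p_intra * prior_i
    den = num + p_extra * (1.0 - prior_i)
    return num / den if den > 0.0 else 0.0


def classify(term_freqs, centroids, lik_intra, lik_extra):
    """Return (best_label, scores) for one document."""
    n_vec = normalize(term_freqs)                    # step a)
    scores = {}
    for label, c in centroids.items():               # step f)
        x = [a - b for a, b in zip(n_vec, c)]        # step b)
        scores[label] = posterior(lik_intra(x),      # steps c) to e)
                                  lik_extra(x),
                                  len(centroids))
    best = max(scores, key=scores.get)               # step g)
    return best, scores                              # step h)


def sq_norm(x):
    return sum(c * c for c in x)


# Placeholder likelihoods (assumed): narrow vs. wide isotropic Gaussians.
def toy_intra(x):
    return math.exp(-sq_norm(x) / 0.2)


def toy_extra(x):
    return math.exp(-sq_norm(x) / 2.0)


centroids = {"sports": normalize([1.0, 0.0, 0.2]),
             "finance": normalize([0.0, 1.0, 0.2])}
best, scores = classify([3.0, 0.0, 1.0], centroids, toy_intra, toy_extra)
```

In this toy run the document shares its dominant term with the "sports"
centroid, so that cluster receives the larger posterior.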



3. The method as set forth in claim 2, wherein the normalized document vector N 
is obtained using an equation, b^j = a 



4. A method of setting up a DLSI space-based classifier for document
classification, comprising the steps of:

setting up a differential term by intra-document matrix where each column
of the matrix denotes a difference between a document and a centroid of a
cluster to which the document belongs;

decomposing the differential term by intra-document matrix by an SVD
algorithm to identify an intra-DLSI space;

setting up a probability function for a differential document vector being a
differential intra-document vector;

calculating the probability function according to projection and distance
from the differential document vector to the intra-DLSI space;

setting up a differential term by extra-document matrix where each
column of the matrix denotes a differential document vector between a document
vector and a centroid vector of a cluster which does not include the document;

decomposing the differential term by extra-document matrix by an SVD
algorithm to identify an extra-DLSI space;

setting up a probability function for a differential document vector being a
differential extra-document vector;

setting up a posteriori likelihood function using the differential
intra-document and differential extra-document vectors to provide a most
probable similarity measure of a document belonging to a cluster; and

using the DLSI space-based classifier to automatically classify a
document.
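
By way of illustration only, the "projection and distance" recited in claim 4
might be computed as below, using the quantities y (the transpose of the
truncated U applied to x) and the squared norm of x minus the sum of the
squared y_i that appear in the dependent claims. The function name
`project_and_residual` is hypothetical, and the basis is assumed to be a list
of orthonormal vectors (the leading columns of U from the SVD), supplied by the
caller.

```python
def project_and_residual(x, basis):
    """Project x onto an orthonormal basis and measure what is left over.

    x     : differential document vector (list of floats)
    basis : list of k orthonormal vectors spanning the DLSI space
    Returns (y, eps2): the k projection coefficients and the squared
    distance from x to the space, eps2 = ||x||^2 - sum(y_i^2).
    """
    y = [sum(u_i * x_i for u_i, x_i in zip(u, x)) for u in basis]
    eps2 = sum(c * c for c in x) - sum(c * c for c in y)
    # Guard against tiny negative values caused by rounding.
    return y, max(eps2, 0.0)


# Toy example (assumed): the space is spanned by the first two axes.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y, eps2 = project_and_residual([3.0, 4.0, 12.0], basis)
# y == [3.0, 4.0], eps2 == 144.0
```

The coefficients y measure how the vector lies within the space, and eps2
measures how far it lies outside it; both would feed the probability function.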

5. The method as set forth in claim 4, wherein the step of setting up a
probability function for a differential document vector being a differential
intra-document vector is performed using an equation,

$$P(x \mid D_I) = \frac{n_I^{1/2} \exp\left(-\frac{n_I}{2} \sum_{i=1}^{k_I} \frac{y_i^2}{\delta_{I,i}^2}\right) \cdot \exp\left(-\frac{n_I \varepsilon^2(x)}{2\rho_I}\right)}{(2\pi)^{n_I/2} \prod_{i=1}^{k_I} \delta_{I,i} \cdot \rho_I^{(r_I - k_I)/2}}$$

where $y = U_{k_I}^T x$, $\varepsilon^2(x) = \|x\|^2 - \sum_{i=1}^{k_I} y_i^2$,
$\rho_I = \frac{1}{r_I - k_I} \sum_{i=k_I+1}^{r_I} \delta_{I,i}^2$, and $r_I$ is
the rank of matrix $D_I$.

6. The method as set forth in claim [illegible], wherein the step of setting up
a probability function for a differential document vector being a differential
extra-document vector is performed using an equation,

$$P(x \mid D_E) = \frac{n_E^{1/2} \exp\left(-\frac{n_E}{2} \sum_{i=1}^{k_E} \frac{y_i^2}{\delta_{E,i}^2}\right) \cdot \exp\left(-\frac{n_E \varepsilon^2(x)}{2\rho_E}\right)}{(2\pi)^{n_E/2} \prod_{i=1}^{k_E} \delta_{E,i} \cdot \rho_E^{(r_E - k_E)/2}}$$

where $y = U_{k_E}^T x$, $\varepsilon^2(x) = \|x\|^2 - \sum_{i=1}^{k_E} y_i^2$,
$\rho_E = \frac{1}{r_E - k_E} \sum_{i=k_E+1}^{r_E} \delta_{E,i}^2$, and $r_E$ is
the rank of matrix $D_E$.

7. The method as set forth in claim 4, wherein the step of setting up a
posteriori likelihood function is performed using an equation,

$$P(D_I \mid x) = \frac{P(x \mid D_I) P(D_I)}{P(x \mid D_I) P(D_I) + P(x \mid D_E) P(D_E)}$$

where $P(D_I)$ is set to $1/n_c$, where $n_c$ is the number of clusters in the
database, and $P(D_E)$ is set to $1 - P(D_I)$.

8. The method as set forth in claim 4, the step of using the DLSI space-based
classifier to automatically classify a document comprising the steps of:



a) setting up a document vector by generating terms as well as frequencies of
occurrence of said terms in the document so that a normalized document vector N
is obtained for the document;

b) constructing, using the document to be classified, a differential document
vector $x = N - C$, where C is the normalized vector giving a center or centroid
of a cluster;

c) calculating an intra-document likelihood function $P(x \mid D_I)$ for the
document;

d) calculating an extra-document likelihood function $P(x \mid D_E)$ for the
document;

e) calculating a Bayesian posteriori probability function $P(D_I \mid x)$;

f) repeating, for each of the clusters of the data base, steps b-e;

g) selecting a cluster having a largest $P(D_I \mid x)$ as the cluster to which
the document most likely belongs; and

h) classifying the document in the selected cluster.



9. The method as set forth in claim 8, wherein the normalized document
vector N is obtained using an equation, b^j =